NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness

https://doi.org/10.18653/v1/2025.naacl-long.27

Sung, Yoo Yeon; Gor, Maharshi; Fleisig, Eve; Mondal, Ishani; Boyd-Graber, Jordan Lee (January 2025, Association for Computational Linguistics)

Full Text Available
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

https://doi.org/10.18653/v1/2024.emnlp-main.1201

Gor, Maharshi; Daumé_Iii, Hal; Zhou, Tianyi; Boyd-Graber, Jordan Lee (January 2024, Association for Computational Linguistics)

CAIMIRA discovers the skills that humans and AIs use to answer questions. By scraping websites where trivia nerds answer really difficult questions and posing those questions to AI models like GPT-4 and LLaMA-3-70B, while humans excel in knowledge-based abductive reasoning, AI outperforms on fact-based historical recall. This research suggests future challenges should focus on more complex reasoning and nuanced language tasks to better align AI development with human cognitive strengths.
more » « less
Full Text Available
PEDANTS: Cheap but Effective and Interpretable Answer Equivalence

https://doi.org/10.18653/v1/2024.findings-emnlp.548

Li, Zongxia; Mondal, Ishani; Nghiem, Huy; Liang, Yijun; Boyd-Graber, Jordan Lee (January 2024, Association for Computational Linguistics)

Full Text Available
You Make me Feel like a Natural Question: Training QA Systems on Transformed Trivia Questions

https://doi.org/10.18653/v1/2024.emnlp-main.1140

Kabir, Tasnim; Sung, Yoo Yeon; Bandyopadhyay, Saptarashmi; Zou, Hao; Chandra, Abhranil; Boyd-Graber, Jordan Lee (January 2024, Association for Computational Linguistics)

Many of the questions for training AIs how to answer questions come from the queries users type into search engines (like Google's Natural Questions). Is there a cheaper---perhaps even better---way? We propose a "naturalization" technique to turn high-quality, rigorously edited trivia questions into examples that resemble Natural Questions. Training on our naturalized questions and testing on natural questions comes close to the results with using Natural Questions, and we can improve results on MMLU (a standard modern evaluation set) by using our data.
more » « less
Full Text Available
A SMART Mnemonic Sounds like “Glue Tonic”: Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

https://doi.org/10.18653/v1/2024.emnlp-main.786

Balepur, Nishant; Shu, Matthew; Hoyle, Alexander; Robey, Alison; Feng, Shi; Goldfarb-Tarrant, Seraphina; Boyd-Graber, Jordan Lee (January 2024, Association for Computational Linguistics)

Learning vocabulary (e.g., benevolent) can be tedious, but using mnemonics (e.g., benevolent sounds like "benefits," and a kind boss gives benefits) makes it more engaging and effective. This paper introduces SMART, a large language model trained to produce mnemonics based on feedback from flashcard learners. Students struggle to predict which mnemonics will help them most. Still, by training SMART on both student preferences and learning outcomes, we can generate mnemonics as effectively as GPT-4, but at a much lower cost.
more » « less
Full Text Available

Search for: All records